Interpretation of Natural Language in an Information System

Author

Hubert Lehmann

Abstract

This paper discusses some of the linguistic problems encountered during the development of the User Specialty Languages (USL) system, an information system that accepts a subset of German or English as input for query, analysis, and updating of data. The system is regarded as a model for portions of natural language that are relevant to interactions with a data base. The model provides insight into the functioning of language and the linguistic behavior of users who must communicate with a machine in order to obtain information. The aim of application independence made it necessary to approach many problems from a different angle than in most comparable systems. Rather than a full treatment of the linguistic capacity of the system, details of phenomena such as time handling, coordination, quantification, and possessive pronouns are presented. The solutions that have been implemented are described, and open questions are pointed out.

Introduction

During construction of the User Specialty Languages (USL) system, a number of linguistic problems were encountered; these had not been treated with sufficient detail in the literature to permit ready implementation of solutions. The solutions found for the USL system in these cases are felt to be of interest also outside the environment of data base interaction via natural language.

The USL system was created to provide users with a tool for accessing and analyzing data without having to become expert in electronic data processing. It was assumed, however, that the user would be a professional knowledgeable in his field, not the casual user as described by Codd [1], and therefore that the system should allow him to express himself in the terminology he was used to. The system was to be application-independent: no features dependent upon subject matter should be present in the language processing part.
An independent data base management system (DBMS) was required for the construction of the USL system, making it possible to benefit from the work done in data base research. To maintain a well-defined interface, input sentences were translated into the formal data manipulation language of the DBMS (a similar approach was also taken by Mylopoulos et al. [2], Sacerdoti [3], Waltz et al. [4], and Sibuya et al. [5]). A revised version of Kay’s parser [6] was used for syntactic analysis. The method of interpretation used in the REL system [7-9] was taken as a point of departure, but this method was augmented to handle coordination, quantifiers, and possessive pronouns more adequately. Hence, the principal work required for the design and implementation of the present system involved constructing a grammar for German that could be recognized by the parser and developing suitable interpretation routines that would perform the mapping from German to the data manipulation language.

The USL system is comparable to question-answering systems, a number of which have been developed during the past fifteen years. Surveys of such systems can be found in [10-12], and comparisons to USL are given in [13], while a comparison between the systems TQA (formerly REQUEST) [14, 15] and LSNLIS [16] is given in [17].

An overview of the USL system is given and similarities to other existing systems are indicated. Thereafter, the semantic concepts underlying the system are introduced in order to provide a basis for the discussions that follow. These discussions concentrate upon four kinds of linguistic structures considered essential for a question-answering system. Without these structures, important sets of queries cannot be formulated.

Temporal expressions: A comparison with Bruce’s CHRONOS system [18] shows that despite his sophisticated model of time, many relatively simple but essential aspects have not been addressed.
Two problems are discussed: proper treatment of vagueness in temporal references, and conversion of deictic references (e.g., “last week”) to actual dates.

Coordinated sentences and phrases: These are among the most difficult structures in natural languages. Their treatment in this paper is not considered exhaustive, although it is hoped that a contribution has been made to the clearer understanding of the coordination phenomena. The possible interpretations of such structures are discussed, and criteria are given that determine the respective interpretations.

Quantifiers: There are still many unresolved questions concerning quantifiers; the treatment here concentrates on a discussion of the scope of quantifiers and on their interplay with particles of negation.

Contextual reference: This problem is addressed in the USL system only with respect to possessive pronouns. Criteria for their reference are discussed, and reasons given why a completely formal treatment is not possible at the present time. The solution found for the USL system is presented and justified.

The examples used in the discussions are in English unless German and English differ in their behavior, in which case German examples with English glosses are given.

Copyright 1978 by International Business Machines Corporation. Copying is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract may be used without further permission in computer-based and other information-service systems. Permission to republish other excerpts should be obtained from the Editor. H. Lehmann, IBM J. Res. Develop., Vol. 22, No. 5, September 1978.

System overview

The USL system (the general design is shown in Fig. 1) is constructed around the relational data base management system PRTV [19], is coded in PL/I, and runs under VM/CMS in a 2500-Kbyte virtual machine.
The system uses a revised form of Kay’s parser [6, 20] and a German grammar (comprising some 800 rules in a modified Backus-Naur format) that was developed for the system. Each rule specifies both a syntactic configuration as a condition for its application and one or more categories that replace the original configuration after the rule has been applied. Each rule also contains reference to an interpretation routine, of which there are some 70 in the system [21]. The German grammar was later taken as a basis for the construction of English, Dutch, and Spanish grammars with the same interpretation routines as the German.

A dictionary contains all relevant function words of the language whose meanings are independent of particular applications (prepositions, conjunctions, “to be,” “to have,” names of months, days of the week, etc.). Attached is an application-dependent dictionary containing all those words used to refer to relations (or tables, as explained in the section on semantic concepts). Numbers and words used to refer to objects within relations are not defined, but are recognized instead by so-called variable token rules (where patterns of letter strings are specified, and categories are assigned to strings conforming to the patterns). Morphological endings are recognized by syn...
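The lookup order implied by the dictionary description above (function words, then application words, then variable token rules) can be illustrated with a minimal Python sketch. This is not the USL implementation, which was written in PL/I. The dictionary entries, the category names (`PREP`, `NOUN`, `NAME`, `NUMBER`), the two patterns, and the functions `categorize` and `tag` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical application-independent dictionary of function words.
FUNCTION_WORDS = {"of": "PREP", "in": "PREP", "and": "CONJ",
                  "is": "BE", "has": "HAVE", "january": "MONTH"}

# Hypothetical application-dependent dictionary: words that name relations.
APPLICATION_WORDS = {"salary": "NOUN", "employee": "NOUN", "earns": "VERB"}

# Hypothetical variable token rules: a pattern over character strings
# paired with the category assigned to any string matching it.
VARIABLE_TOKEN_RULES = [
    (re.compile(r"\d+(\.\d+)?"), "NUMBER"),   # e.g. 1978, 3.5
    (re.compile(r"[A-Z][A-Za-z]*"), "NAME"),  # e.g. Smith
]


def categorize(token):
    """Return a category for one token, or None if nothing applies."""
    lowered = token.lower()
    if lowered in FUNCTION_WORDS:
        return FUNCTION_WORDS[lowered]
    if lowered in APPLICATION_WORDS:
        return APPLICATION_WORDS[lowered]
    # Undefined words fall through to the pattern-based rules.
    for pattern, category in VARIABLE_TOKEN_RULES:
        if pattern.fullmatch(token):
            return category
    return None


def tag(sentence):
    """Split on whitespace and pair each token with its category."""
    return [(token, categorize(token)) for token in sentence.split()]


if __name__ == "__main__":
    print(tag("salary of Smith in 1978"))
```

In this sketch, only "Smith" and "1978" reach the pattern rules, because neither is defined in a dictionary. That mirrors the stated idea that object names and numbers are recognized by form rather than by listing them.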




Journal:

IBM Journal of Research and Development

Volume 22

Publication date: 1978
